Speech synthesis without a phone inventory
Authors
Abstract
In speech synthesis, the unit inventory is usually decided using phonological and phonetic expertise. This process is resource-intensive and potentially sub-optimal. In this paper we investigate how acoustic clustering, together with lexicon constraints, can be used to build a self-organised inventory. Six English speech synthesis systems were built, using two frameworks (unit selection and parametric HTS) for each of three inventory conditions: 1) a traditional phone set, 2) a system using orthographic units, and 3) a self-organised inventory. A listening test showed a strong preference for the classic phone-set system, and for the orthographic system over the self-organised one. Results also varied with letter-to-sound complexity and database coverage. This suggests that the self-organised approach failed to generalise pronunciation, and also introduced noise beyond that caused by the mismatch between orthography and sound.
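The abstract only outlines the method. The following Python sketch shows one possible reading of "acoustic clustering together with lexicon constraints" and sets it beside the orthographic condition. It is not the authors' implementation. Every detail below is an assumption:

- Each letter token has already been aligned to a fixed-length acoustic feature vector.
- Clustering is plain k-means with a deterministic farthest-first initialisation, chosen here for reproducibility.
- The "lexicon constraint" means that all tokens at the same position of the same word share one unit, chosen by majority vote.

All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def orthographic_lexicon(words):
    """Orthographic condition: a word's unit sequence is simply its letters."""
    return {w: list(w.lower()) for w in words}

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation (assumed)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        nxt = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(nxt))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def self_organised_lexicon(tokens, k):
    """Build a lexicon of acoustic-cluster units (hypothetical).

    tokens: list of (word, letter_position, feature_vector), one per letter token.
    Assumed lexicon constraint: all tokens of the same (word, position) share
    one unit, chosen by majority vote over their cluster labels.
    """
    labels = kmeans([feats for _, _, feats in tokens], k)
    votes = defaultdict(Counter)
    for (word, pos, _), lab in zip(tokens, labels):
        votes[(word, pos)][lab] += 1
    per_word = defaultdict(dict)
    for (word, pos), counter in votes.items():
        per_word[word][pos] = f"u{counter.most_common(1)[0][0]}"
    return {w: [units[i] for i in sorted(units)] for w, units in per_word.items()}

# Invented toy features: only 'c' and 'a' in "cat" and 'c' in "city" are covered.
tokens = [
    ("cat", 0, [0.0, 0.0]), ("cat", 0, [0.1, 0.0]),
    ("cat", 1, [5.0, 5.0]), ("cat", 1, [5.1, 5.0]),
    ("city", 0, [10.0, 0.0]), ("city", 0, [10.1, 0.0]),
]
print(orthographic_lexicon(["cat", "city"]))  # {'cat': ['c', 'a', 't'], 'city': ['c', 'i', 't', 'y']}
print(self_organised_lexicon(tokens, k=3))     # {'cat': ['u0', 'u2'], 'city': ['u1']}
```

In the toy run, the letter "c" receives different units in "cat" and "city". An orthographic inventory cannot make that distinction. In this sketch, units exist only for words seen in the corpus. Handling unseen words would require a separate letter-to-unit predictor, which is one place where generalisation can fail.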